ANDERSON_KSAMP

Overview

The ANDERSON_KSAMP function performs the k-sample Anderson-Darling test, a non-parametric statistical test that determines whether two or more samples are drawn from the same population distribution. Unlike parametric tests, it does not require specifying the underlying distribution, making it particularly versatile for exploratory data analysis.

The k-sample test extends the classic one-sample Anderson-Darling goodness-of-fit test to comparisons among multiple samples. It tests the null hypothesis that all k samples originate from a common, unspecified distribution. If the test statistic exceeds the critical value for the chosen significance level (or, equivalently, the p-value falls below that level), the null hypothesis is rejected, suggesting that at least one sample comes from a different distribution.

This implementation uses SciPy’s anderson_ksamp function from the scipy.stats module. The test is based on the methodology described by Scholz and Stephens (1987) in their paper “K-Sample Anderson-Darling Tests” published in the Journal of the American Statistical Association.

The function returns a normalized test statistic along with seven critical values corresponding to significance levels of 25%, 10%, 5%, 2.5%, 1%, 0.5%, and 0.1%, in that order. The p-value is interpolated from tabulated values and is floored at 0.1% and capped at 25%, so a reported p-value of exactly 0.001 or 0.25 is a bound rather than an exact value. To interpret the results, compare the test statistic against the critical values: if the statistic exceeds the critical value for a given significance level, the null hypothesis can be rejected at that level.
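The comparison described above can be sketched in a few lines of Python. This is only an illustration: the helper name rejected_levels is hypothetical and is not part of this function or of SciPy, and the input row is the expected output of Example 1.

```python
# Significance levels (as fractions) matching the seven critical values, in order.
SIGNIFICANCE_LEVELS = [0.25, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001]

def rejected_levels(row):
    """Return the significance levels at which the null hypothesis is rejected.

    `row` is one result row: [statistic, p_value, seven critical values].
    """
    statistic = row[0]
    critical_values = row[2:]
    return [
        level
        for level, critical in zip(SIGNIFICANCE_LEVELS, critical_values)
        if statistic > critical
    ]

# Expected output row of Example 1
example_row = [-0.9399, 0.25, 0.325, 1.226, 1.961, 2.718, 3.752, 4.592, 6.546]
print(rejected_levels(example_row))  # [] : the statistic is below every critical value
```

An empty list means the null hypothesis is not rejected at any of the tabulated levels, which is consistent with the capped p-value of 0.25 in that example.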

The midrank parameter controls which variant of the test is applied. When set to TRUE (the default), the midrank empirical distribution function is used, which is appropriate for both continuous and discrete data. When set to FALSE, the right-side empirical distribution is used, which is designed specifically for discrete data where ties may occur between samples.
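To make the two empirical distribution conventions concrete, here is a minimal sketch of how each one evaluates a single sample at a tied value. It assumes the usual textbook definitions (right-side: fraction of observations less than or equal to x; midrank: tied observations count as one half). The function names are hypothetical, and this is not SciPy's internal code.

```python
def right_edf(sample, x):
    """Right-side EDF: fraction of observations less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def midrank_edf(sample, x):
    """Midrank EDF: observations tied with x are counted as one half each."""
    below = sum(1 for v in sample if v < x)
    tied = sum(1 for v in sample if v == x)
    return (below + 0.5 * tied) / len(sample)

sample = [1, 2, 2, 3]
print(right_edf(sample, 2))    # 0.75
print(midrank_edf(sample, 2))  # 0.5
```

The two conventions agree at points that are not observations (both give 0.75 at x = 2.5 here) and differ at observed values.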

This example function is provided as-is without any representation of accuracy.

Excel Usage

=ANDERSON_KSAMP(samples, midrank)

samples (list[list], required): Table of sample groups. Each row of the range (each inner list) is passed to SciPy as one sample group, and each value in that row is an observation. Must have at least two rows and at least two values per row.
midrank (bool, optional, default: true): If TRUE, uses the midrank test (recommended for continuous and discrete data). If FALSE, uses the right side empirical distribution for discrete data.

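Returns (list[list]): Single-row 2D list [[statistic, p-value, seven critical values]], with the critical values ordered by significance level (25%, 10%, 5%, 2.5%, 1%, 0.5%, 0.1%), or an error string.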
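Because the return value is either a single-row 2D list or an error string, calling code has to check which one it received. The following sketch shows one way to do that; the helper name describe_result is hypothetical and is not part of this function.

```python
def describe_result(result):
    """Summarize a return value that is either an error string or [[stat, p, cv...]]."""
    if isinstance(result, str):
        # Error case: the function returned a message instead of a result row
        return f"error: {result}"
    statistic, p_value, *critical_values = result[0]
    return f"statistic={statistic}, p={p_value}, {len(critical_values)} critical values"

print(describe_result([[-0.9399, 0.25, 0.325, 1.226, 1.961, 2.718, 3.752, 4.592, 6.546]]))
# statistic=-0.9399, p=0.25, 7 critical values
print(describe_result("Invalid input: all sample values must be numeric."))
# error: Invalid input: all sample values must be numeric.
```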

Examples

Example 1: 2×3 range, midrank = TRUE

Inputs:

samples			midrank
1.1	2.2	3.3	true
1.2	2.1	3.4

Excel formula:

=ANDERSON_KSAMP({1.1,2.2,3.3;1.2,2.1,3.4}, TRUE)

Expected output:

Result
-0.9399	0.25	0.325	1.226	1.961	2.718	3.752	4.592	6.546

Example 2: 3×3 range, midrank = TRUE

Inputs:

samples			midrank
1.1	2.2	3.3	true
1.2	2.1	3.4
1.3	2.3	3.1

Excel formula:

=ANDERSON_KSAMP({1.1,2.2,3.3;1.2,2.1,3.4;1.3,2.3,3.1}, TRUE)

Expected output:

Result
-1.3062	0.25	0.4493	1.3053	1.9434	2.577	3.4163	4.0721	5.5642

Example 3: 2×3 range, midrank = FALSE

Inputs:

samples			midrank
1.1	2.2	3.3	false
1.2	2.1	3.4

Excel formula:

=ANDERSON_KSAMP({1.1,2.2,3.3;1.2,2.1,3.4}, FALSE)

Expected output:

Result
-0.8673	0.25	0.325	1.226	1.961	2.718	3.752	4.592	6.546

Example 4: 3×3 range, midrank = FALSE

Inputs:

samples			midrank
1.1	2.2	3.3	false
1.2	2.1	3.4
1.3	2.3	3.1

Excel formula:

=ANDERSON_KSAMP({1.1,2.2,3.3;1.2,2.1,3.4;1.3,2.3,3.1}, FALSE)

Expected output:

Result
-1.2389	0.25	0.4493	1.3053	1.9434	2.577	3.4163	4.0721	5.5642

Python Code

import warnings
from scipy.stats import anderson_ksamp as scipy_anderson_ksamp

def anderson_ksamp(samples, midrank=True):
    """
    Performs the k-sample Anderson-Darling test to determine if samples are drawn from the same population.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        samples (list[list]): Table of sample groups. Each row (each inner list) is passed to SciPy as one sample group, and each value in that row is an observation. Must have at least two rows and at least two values per row.
        midrank (bool, optional): If TRUE, uses the midrank test (recommended for continuous and discrete data). If FALSE, uses the right side empirical distribution for discrete data. Default is True.

    Returns:
        list[list]: 2D list [[stat, p, critical_values...]], or error string.
    """
    # Validate samples: each inner list (one worksheet row) is one sample group
    if not isinstance(samples, list) or len(samples) < 2:
        return "Invalid input: samples must be a 2D list with at least two rows (sample groups)."
    if any(not isinstance(row, list) or len(row) < 2 for row in samples):
        return "Invalid input: each sample group must be a list with at least two values."
    # Copy each group as-is; no transposition is performed
    groups = [list(row) for row in samples]
    # Reject non-numeric values (bool is a subclass of int, so exclude it explicitly)
    for group in groups:
        for v in group:
            if isinstance(v, bool) or not isinstance(v, (int, float)):
                return "Invalid input: all sample values must be numeric."
    try:
        with warnings.catch_warnings():
            # SciPy warns when the p-value hits the 25% cap or the 0.1% floor
            warnings.filterwarnings('ignore', message='p-value (capped|floored)')
            result = scipy_anderson_ksamp(groups, midrank=midrank)
    except Exception as e:
        return f"scipy.stats.anderson_ksamp error: {e}"
    # Compose output row: statistic, p-value, then the seven critical values
    output = [float(result.statistic), float(result.pvalue)]
    output.extend(float(cv) for cv in result.critical_values)
    # Check for nan/inf (x != x is true only for NaN)
    if any(x != x or abs(x) == float('inf') for x in output):
        return "Invalid output: statistic or critical values are not finite."
    return [output]
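The code above passes each inner list to SciPy as one sample and does not transpose the input. If a range is laid out with one sample per column instead, it can be transposed before the call. The sketch below shows one way to do that; the helper name columns_to_groups is hypothetical and is not part of the function above.

```python
def columns_to_groups(table):
    """Turn a row-major table into one list per column."""
    return [list(column) for column in zip(*table)]

# Two rows by three columns, as in Example 1
table = [
    [1.1, 2.2, 3.3],
    [1.2, 2.1, 3.4],
]
print(columns_to_groups(table))  # [[1.1, 1.2], [2.2, 2.1], [3.3, 3.4]]
```

Note that zip stops at the shortest row, so this sketch assumes a rectangular table.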
